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Abstract 

We compare different conjugate gradient - like matrix inversion methods (CG, Bi- 
CGstabl and BiCGstab2) employing for this purpose the compact lattice quantum 
electrodynamics (QED) with Wilson fermions. The main goals of this investigation 
are the CPU time efficiency of the methods as well as the influence of machine 
precision on the reliability of (physical) results especially close to the 'critical' line 



1 Introduction 

In many physical applications it is necessary to perform the inversion of some large 
N X N matrix A4, i.e., to solve the equation 

M-X = ^, (1) 

where is some known input-vector and X the required solution. It becomes 
not an easy computational task to calculate the vector X when, for example, 
A^^IO^ especially when A4 is not well-conditioned. The problem of efficiency 
and reliability of the inversion algorithm appears to be of crucial importance. The 

* Work partly supported by EEC-Contract CHRX-CT92-0051 
^Permanent address: Joint Institute for Nuclear Research, Dubna, Russia 
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lattice approach to gauge theories (e.g., QED or QCD) which provides a powerful 
tool for the numerical study of quantum field theories in the nonperturbative regime 
is a typical example. In the case of the U{1) gauge group and Wilson fermions 
(QED) the matrix to be inverted is [|l| 



M{U;k) = i-K-Q{U), 

4 
M=l 

where x, y denote the sites of a 4d lattice, fi is the unit vector in the direction 
/i , and 7^'s are 4x4 Dirac's matrices in euchdean space. The set of gauge field 
hnk variables {^4^} ; G U{1) generated with the proper statistical weight 

defines the corresponding gauge field configuration, and the hopping parameter k 
is related to the bare mass of the fermion . The dimension of the fermionic matrix 
is X with = 4 ■ Ngitgs, and the typical number of sites in lattice calculations 

is NsUes ~ 8^ - 20^ . 

In our study we were especially interested in the physically most interesting - 
and technically most difficult - 'extreme' case, when the minimal eigenvalue Xmin 
of the fermionic matrix A4 tends to zero. In the case of Wilson's fermionic 
matrix this limit is connected with the special choice of the hopping parameter 
K, —> KciP), P being the gauge coupling parameter, and is supposed to correspond 
to the chiral limit of the theory (see, e.g., [Q]). By approaching the chiral limit within 
the confinement phase the most common matrix inversion methods like conjugate 
gradient (CG) or minimal residue (MR) become extremely inefficient or fail (for 
review see, e.g., [Q) according to the fact that the condition number of the fermionic 
matrix diverges. 

Recently new very promising cg-like matrix inversion algorithms have been pro- 
posed : BiCGstabl 0] and BiCGstab2 The successful application of BiCGstabl 
to lattice gauge theory issues has been demonstrated in for intermediate quark 
masses, i.e., for k's sufficiently below ndP)- In this case the CPU time improve- 
ment factor was reported to be of about 2-^3 compared to CG. Similar improvement 
was reported also for the BiCGstab2 method ||. However, the problem of the 
efficiency and reliability of these methods in the Xmin limit deserves further 
study. A delicate point to be studied is the possible breakdown because of the 
accumulation of roundoff errors. This accumulation becomes much more dangerous 
when approaching the chiral limit and/or when the precision of the used machine 
is not large, as in the case of APE/Quadrics. On this machine, the native single 
precision [] arithmetic can be improved using accurate summation methods but 
which cannot fully compete with hardware supported double precision and require 
more computational power. Another interesting question which did not attract still 



32bit precision 
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sufficient attention is the the connection between the convergence of the residue 
vectors and the convergence of observables. As it will be shown, this connection 
can be rather nontrivial. 

The above mentioned topics constitute the main goals of this work. In the 
second section we give a short description of the inversion methods under study 
and main notations. The third section is devoted to the reliability of the inversion 
methods, while in section four we present the CPU time improvement factors. The 
last section contains our conclusions. 



2 Methods and observables 

The structure of the fermionic matrix Ai in eq.(0) permits an optimization based 



on an even-odd decomposition [|T0], 111 



where subscripts e and o stand for the even and odd subspace, correspondingly. 
Therefore, it is sufficient to invert, say, the matrix Ve 

defined only on the even subspace in order to solve the problem (|I|) since the solution 
vector of the complementary subspace can be easily constructed from the solution 
Xg. Throughout this work we refer to the even-odd preconditioned versions of the 
investigated algorithms. 

We made a comparative study of the standard conjugate gradient method (CG), 
biconjugate gradient stabilized (BiCGstabl) and biconjugate gradient stabilized II 
(BiCGstab2). These methods approximate the true solution iteratively by gener- 
ating a sequence Xq, Xi, . . . , Xk, . . ., where Xq denotes the starting vector which 
can be chosen with different methods, for example in a random way. For complete- 
ness we give a short schematic description of these methods. To keep the notation 
simple we drop the subspace index in the algorithmic part of the formulas. 

• CG. The equation to solve is VXf, = ip" = V^ip'^ = T>\{Lpf, — nQeo'^o), with 

V = vlv,. 

P^ = R, = p>" - VXo 
For k = 0,... : 

Rk+l = Rk — ^kT^Pk 

Pk+l = Rk+l + Ik+lPk ] Ik+l = {Rk+l, Rk+l) / {Rk, Rk) ■ 

This method is simple in implementation and reliable, though it is not the 
fastest. In the version with double precision we use it as a reference point for 
the other two methods. 
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• BiCGstabl. The equation to solve is VX^ — </?e) with V — Vg. 
"0 = ^0 = Po = 1 

For A; = 1, . . . : 

Pk — {Ro,Rk-l) l^k — {PkOCk-l) / {Pk-l^k-l) 
Pk — Rk-1 + Pk{Pk-l — ^k-lVk-l) 

Vk = VPk ; ak = Pk/{Ro: Vk) 
Sk — Rk-i — oikVk 



• BiCGstab2. The equation to solve is VXe — (fi'^, with V — Vg. 
Ro^Ro^ip'- VXo 

choose Yq such that 6o — {Yq, Rq) and a;^"^ = (yo, ^-^o)/<^o 7^ 
For A; - 0, . . . : 

Vfc+i = Uk- uJkVTk , k>l 

Uk+i = Rk — ^^kT^Rk 
If k even then m := k/2 





else m :— {k — l)/2 



Bm+i = [Uk+i-Vk+i\T^Uk+i] (2 column matrix) 




Rk+1 — (1 — Cm)Vfc+l + ^mUk+1 + VmVUk+1 

Xk+1 — (1 — ^m){Xk-l + LOk-lRk-l + ^kTk) + ^m{Xk + ^kRk) — VmUk+1 

h+1 = (^0,-Rfe+i) ; l/^k+l ^ (i^kh+l) / (VmSk) 

Rk+1 — Rk+1 — ■0fe+l ((1 — ^m)Tk + ^mR-k + V'wPRk) 



endif 



Tk+i — Uk+i — ipk+iRk ', T^Tk+1 — VUk+i — i^k+iT^Rk 



^k+i — 5k+i/{yG,'DRk+i) . 
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Compared to BiCGstabl this algorithm has more comphcated recurrences and re- 
quires more dot product operations. 

The iterative procedure was stopped when ||-Ra:|| < ^ ■ We used e = 10~^ when 
implemented on an APE/Quadrics system (equivalent to the condition | |-Rfc| | norm = 
I l-Rfcl I/I l-Rol I ~ 10^'' on an 8'^ x 16 lattice), and e = 5 ■ 10~^ on double precision 
machines. A point to mention is that the matrix to be inverted in the case of CG 
is V = 'D\T>e and hence differs from the BiCGstabl/2 cases where V = T>e. The 
application of the same stopping criterion ||i?fc|| < £ to the different algorithms 
entails a systematic error, which we have found to be O-i-2%. 

The algorithmic residue vectors E!^^" = Rk can differ substantially from the 
corresponding true residue vectors R^^^^ = f~ l^^k due to roundoff errors. Apart 
from the norms of the two residue vectors (||-Rfc™^|| and ||-Rfc'^°||) we monitored 
also for every configuration the convergence of the pion norm 11 , scalar condensate 
and pseudoscalar condensate ip'j^ip defined as : 



n = — Tr 

^-'^ sites 



^-175^-^75 ; (5) 



= .^—.Tt[M-']; ^757/> = — ^■Tr[75A^-i] . (6) 

^-^^ sites ^-^^ sites 

It is useful to write down the spectral representation of these observables 

n = 

'^■^sites n f^n 

= — Y' ' ^^5^ = — 51 — ' (7) 

^-^^ sites n '^n '^-^^ sites n l-^n 

where A„ and /i„ denote the eigenvalues of M. and 7W75, respectively. Since 
/i„ — > when the corresponding A„ — >■ 0, it becomes clear why 11 is a very 
sensible observable near Kc{i3). 

The conventional definition of the bare fermion mass ruq is 



-•(- 

2 



' 2 K,{f3)' 

To monitor the influence of the accuracy we performed our calculations on a 
Cray-YMP and Siemens-Fujitsu (default 64bit, i.e., double precision), Convex- 
C3820-ES (single and double precision) and a QH2 - 8x8x4 APE/Quadrics 
semitower. The native single precision of QH2 was improved by implementation of 
the Kahan summation method applied to global sums 

The lattice sizes we used have been 16'^ x 32 and 32'^ x 64 on the QH2, and 
8^ X 16 and 16^ x 32 on the vector machines. The choice of the gauge coupling 
parameter j3 was and 0.8 in the confinement phase and 1.1 in the Coulomb 
phase. 
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3 Reliability and precision effects 



3.1 CG in single and double precision modes 

One way to study the reliability of the inversion method is to monitor the inversion 
history of the two residues : and In the case of the double precision 

conjugate gradient method (CGDP) the norms of these two residues coincide with 
high accuracy down to very small values which ensures the reliability of the result. 
In Figure |I]a we show an example of such a run at (3 = 0.8 and k = 0.2170 on an 
8^ X 16 lattice. The value of the hopping parameter k was chosen to be rather 
close to the critical value k,c{P = 0.8) ~ 0.2171(1). 

To our experience, CGDP never fails, although can become very slow. Therefore, 
we used the double precision conjugate gradient method as a reference point for the 
two 'stabilized' methods and for CG in single precision mode (CGSP). 

To single out the effect of machine accuracy, we applied the single precision and 
double precision CG to the same configurations providing the same startvectors 
and sourcevectors (see, e.g.. Figure |l]a). The single precision residue H-Rfc™"^!! 
decouples from the other residues after ~ 1500 iterations due to lack of accuracy. 
However, the single precision algorithmic residue coincides with both 

double precision residues. In this case the pion norm 11 practically does not depend 
on the machine accuracy. It reaches its plateau before Hi?^™*^!! has decoupled from 
the other residues. 

In some cases, however, the decoupling of ||-R^'^°|| and occurs before 

the observable (e.g., the pion norm) reaches its plateau. An example of such an 
inversion history on the QH2 (single precision) for n = 0.2170 is shown in Figure 
|l|b. Despite the Kahan improved summation, starts even to increase long 

before the plateau in 11 is reached. 

Attempting to determine the reliability of the single precision CG method in 
a more confidential way we compared the solutions for our fermionic observables 
given by CGSP to those of CGDP applied to a set of 200 gauge configurations for 
various values of k in the vicinity of at /3 = on a 8^ x 16 lattice. On 
every configuration k we checked the relative deviation 

sm = , (9) 

Odp 

where O^^pgp denotes one of Tpip^ ^Ibi^, or 11 on the particular configuration 
obtained by CGDP or CGSP, respectively. Some results are given in Figure In 
the case oi k = 0.25 (see Figure 0a) the fermionic observables seem to be quite well 
reproduced by CGSP. The maximal relative deviations show up in ipj^ip, but remain 
under 3%. For the averages we obtain: (5('?/''?/')) = — 3x 10^^±4x 10~^, {S{4j'~f5ip)) = 
2 X 10-^ ± 3 X 10-^ (5(n)) = -9 X 10~^ ± 2 X lO^^ . Therefore, the average 
'IEEE-noise' coming from the comparison of single to double precision data is well- 
consistent with zero within errorbars. However, at another - somewhat lower - value 
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Figure 1: Convergence histories on single configurations on a vector machine (a) 
and QH2 (b). Inorm — \ \Rk\\/\\Ro\\- 
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Figure 2: The relative deviation S as defined in eq.(^ at k = 0.25 (a) and 
K = 0.2498 (b). 



of K (Figure ^d) CGSP provided results which significantly differed from those 
of CGDP on some configurations and hence cannot be ignored. In particular, the 
maximal is about ~ 20%. For the averages of 6 we obtain {5{'4)ip)) ^ 

-0.002 ± 0.02 , (^(V^TsV')) 0.0004 ± 0.007 and {5{Ii)) 0.0007 ± 0.09, 

which, however, are still well consistent with zero. The maximal relative deviations 
observed on other values of k have been ~ 6 39%, demonstrating that CG in 
single precision mode is not a priori reliable in the 'critical' region. Although the 
net effect of single precision on the CG method seemed to be small, i.e. the found 
averages of 5 's are compatible with zero, it might happen that the averages of 
observables suffer from single precision, especially on sets of small statistics. 

Therefore, the question about the reliability of the single precision (even Kahan 
improved) CG still remains open in the case of the confinement phase. More detailed 
study, especially on larger lattices, is necessary to draw a final conclusion here. 

Fortunately, the situation is much more favorable in the Coulomb phase. The 
inversion procedure is comparatively fast both for and K > Kc- For all 

our observables we found very reasonable agreement between single precision and 
double precision calculations. As an example, we show in Figure ^ the average 
values of the pion norm 11 calculated with single (QH2) and double precision at 
(3 = 1.1. Since most of the DP data are obtained on a smaller lattice than the QH2 
data there is a (rather weak) dependence of (11) on the lattice size in the 'critical' 
region around Hc- For the other values of k, the agreement between CGDP and 
CG on the QH2 is very good. Since the vicinity of is most interesting, we 
calculated four datapoints around with CGDP on a 16^ x 32 lattice in order 
to compare them with the QH2 results from the same lattice size (see the inset of 
Figure ^. Obviously, the QH2 implementation of CG provides correct results at 
this P value. 

3.2 Double precision BiCG— stabilized methods 

As in the previous subsection for CG we compared the 'stabilized' methods by 
measuring fermionic observables on the same configurations. As long as the hopping 
parameter k was not too close to Kc{(5) all three methods agreed with very high 
precision for all /3-values we used. However, at k — >• ndP) failures occurred even 
in the double precision mode, i.e. either the method did not converge at all or (as 
in the case of BiCGstab2) provided a solution deviating significantly from CGDP. 
As an example of non-convergence of BiCGstabl we show in Figure ^the first 4000 
iterations of one inversion process at f3 = and k = 0.2496. After 20000 
iterations we finally terminated the iterations. The norm of the residue vector 
ll^a^soji Q|/^<™e|| _ ||7^a'9o||^ (^jfj ^ot dccreasc as expected and always remained 

larger than one. Both and 11 stayed approximately constant when 

the number of iterations became larger than 2500. For comparison the dashed 
horizontal line shows the CGDP solution for the pion norm 11. Restarting the 
procedure in some cases can help to resolve the problem. However, the effect of 
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restarting seems to be rather sensible to the chosen frequency (see, e.g., 0, p. 22). 

Tables to § show values of (3 and k where failures of the different methods 
have been observed. The convergence of BiCGstabl demonstrates an interesting 
/9-dependence. While in the extreme strong coupling limit /5 = it failed on some 
configurations in the limit k i^dP), we did not observe failures at the larger 
/3-values in this limit. Failure rates of BiCGstabl will be discussed in the next 
section. 

As can be seen from these Tables BiCGstab2 appears to be less reliable when 
n ~ He- Figure |^ illustrates the situation for different /t-values at /3 = 0.8. For 
every K-value the pion norm 11 was calculated with the different methods on 
the same configuration. In the 'critical' region (lower plot in Figure 1) BiCGstabl 
matches CG with high precision, while BiCGstab2 fails. Near to Kc{i3) the number 
of failures of BiCGstab2 turned out to be very large. Therefore, we conclude that 
BiCGstab2 even in double precision mode is unsuitable for matrix inversion in the 
limit Xmin , i.e., k —>■ ndP)- The comparative complexity of this algorithm 
turns out to be rather a disadvantage for the range of investigated values of P and 
K. In particular, the calculation of the optimal coefficients and rjm (see Section 
2 and 0) can be the source of numerical instability, since it involves operations 
like extra dot products, matrix inversion and rather complicated recurrences where 
roundoff errors can dramatically accumulate. This is quite likely what happens in 
the case k Kc(/3)- 

We also performed runs at different /3's and k's in the 'critical' region to com- 
pare the averages of different fermionic observables. Apart from the pion norm and 
fermionic condensate we calculated also for every configuration k the pseudoscalar 
correlator 



rfc(r) ~ E {Tr {M-ll,M-b,) - Tr (A^^^Ts) " Tr {M~1^,) } , 



r = |X4 - y^l 



to determine the effective pion mass m^(r). For values close to Hc we used 



an estimator defined in to obtain a clear signal of mj^{T). In Figure ^ we show 
the r-dependence of the effective 'pion' mass mj^lr) at (3 = 0.8 and k = 0.2165. 
Evidently, BiCGstabl provides results well consistent with that obtained by CG. 



3.3 Single precision BiCG— stabilized methods 

This section is devoted to the comparison of the single precision and double precision 
BiCGstab methods. The main accent was made on the TAO-language realization 
of these algorithms on the APE/Quadrics machine (QH2). Since BiCGstab2 turned 
out to be unreliable (and in addition less efficient than BiCGstabl - see the next 
section) for n 's close to ndP) even in double precision we excluded this algorithm 
from further study on the QH2. 
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Figure 4: An inversion history of BiCGstabl in double precision mode at /5 = 
and K = 0.2496. The dashed horizontal line marks the CGDP solution for 11 . 
After 20000 iterations the inversion process has been aborted. 
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As an illustration, we show in Figure the inversion history of the BiCGstabl- 
run performed on the QH2 at /? = 0.8 and k = 0.214. We have chosen here 
the configuration with the maximal number of iteration steps. The accumulation 
of roundoff errors entails the ramification of the two residues and Hi?*™*^!!. 

This is similar to the case of CG in single precision (compare to Figure |ip. Note, that 
the Kahan summation which we used does not substantially increase the accuracy 
compared to unimproved single precision. 

In Figure | we show the average of the pion norm (11) as a function of k at 
jS = 0.8 on a 16^ x 32 lattice. For k < 0.214 we observe a very good agreement 
between BiCGstabl and CG. We checked that these data are also well consistent 
with CGDP data, which are not shown in this K-region. For k > 0.214 BiCGstabl 
with single precision did not converge, in contrast to the double precision case at this 
value of (3. For all inversion methods, statistical averages and errorbars become 
ill-defined due to 'exceptional' configurations close to i^dP) ^Jl- This effect is 
observable already at k = 0.214 and especially at k = 0.215 . 

Tables Q to ^ summarize our results concerning the reliability of the inverter 
methods on the QH2 and on the double precision machines in the confinement 
and Coulomb phases. In the confinement phase single precision BiCGstabl and 
BiCGstab2 were observed to fail at smaller values of k compared to the runs with 
double precision arithmetic, and therefore are unsuitable to explore the 'critical' 
region. On the other hand, in the Coulomb phase BiCGstabl on the QH2 proved 
to work even for k = Kc{i3). 

To quantify the results of Tables [1|, ^ we show in Figure ^ the failure rates of 
the single and double precision BiCGstabl implementations in dependence on rrig 
obtained from 210 gauge configurations. We allowed the fermionic observables to 
deviate by a certain cutoff from the corresponding CGDP result on each configu- 
ration. The failure rates turned out to be almost insensitive to the special choice 
of the cutoff in both, single and double precision cases, when it was varied from 
0.5% to 40%. The results confirm, that the failure rate is drastically increased when 
single precision is used. It suggests also, that the double precision version of BiCG- 
stabl may be applied very close to Hc in the confinement phase. For instance, at 
(3 = 0.8 the failure rate of the double precision BiCGstabl has been found to be 
zero for all investigated values of k. 

4 Efficiency 

In the confinement phase the double precision BiCGstabl demonstrates a pro- 
nounced superiority over CG and BiCGstab2 up to values of k rather close to 
Kc{l3). As an example, we show in Figure [1^ the inversion history for CG and Bi- 
CGstabl at /3 = 0.8 and n = 0.2170 {k^ = 0.2171(1)) on an 8^ x 16 lattice in 
double precision mode. Both methods were applied to the same configuration. The 
data points for 1 1 1 1 and 1 1 1 1 are on top of each other for both methods. 
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Figure 7: Normalized and Hi?*™*^]] vs. inversion iterations for the 

BiCGstabl method on the QH2 and the corresponding evolution of 11 . 
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Table 1: Reliability of inverters on 16^ x 32 (QH2) and 8^ x 16 lattices (double 
precision machine). Symbol ^ denotes correct fermionic observables; Q indicates 
a failure of the method. A question mark means that no conclusion could be drawn. 
Kc{P) is given by boldface. 
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Figure 9: The failure rate (in %) of BiCGstabl for (3 = and (3 = 0.8 in double 
and single precision modes, niq is defined in eq.(|). 
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Confinement phase f3 = 
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Table 2: Same as Table |l] but at /3 = 0. 



The pion norm 11 reaches a plateau before the stopping criterion is fulfilled. As far 
as CG and BiCGstabl require the same number of matrix multiplications per iter- 
ation their CPU time requirement per iteration is about the same. Viewed in this 
context, the small number of BiCGstabl iterations compared to CG is impressive. 

To quantify this point we compared the average CPU-times tcpu per inversion 
between the different methods. In the case of BiCGstab2 we compared tcpu for 
10 configurations to the CPU-time required by BiCGstabl on the same set of 
configurations including the same initial conditions. It turned out, that in the 
confinement phase at /? = 0.8 BiCGstab2 is by a factor ~ 0.6 -h 0.9 less 
efficient than BiCGstabl, depending on the choice of k. In Figure |ll] we display 
the comparative efficiency, i.e. the ratios t^pu/^cpu'^'''^"'^^ ^"^^ ^c^/^cpu'^''''''^'^ at 
j3 = 0.8 in dependence on k. The number of measurements for BiCGstabl on the 
double precision machine has been 200, while on the QH2 it ranged from ~ 80-i-800. 

BiCGstabl yields a considerable gain in CPU time by a factor of ~ 3 5 on 
the QII2. This factor is supported by the double precision data and is comparable 
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Coulomb phase (3 = 1.1 
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Table 3: Same as Tables |T], ^ but in the Coulomb phase at (3 = 1.1. 



with findings from quenched QCD tests Note, that the comparative efficiency 
is rather stable with respect to the lattice size, since Figure [Tl| displays data from 
8^ X 16 and 16^ x 32 lattices. In the case of /3 = the same picture holds 
qualitatively, however the maximal CPU time improvement factor was around 9 on 
double precision vector machines. 

As mentioned, BiCGstab2, if reliable, was always less efficient than BiCGstabl 
but a factor of ~ 2 3 better than CG. 

Things change dramatically in the Coulomb phase (see Figure |T^). The com- 
parative efficiency of BiCGstabl displays a sudden drop around Kc, the trend for 
BiCGstab2 is the same. Sufficiently below Kc the maximal factor of CPU time 
improvement is below 2 and goes to zero for /t — > . BiCGstab2 has always 

found to be less efficient than BiCGstabl. 

Presumably, the completely different distributions of eigenvalues of the fermionic 
matrix in the confinement and Coulomb phases cause this significantly different 
behavior of the inversion algorithms. 



5 Summary and conclusions 

To summarize, we studied the reliability and comparative efficiency of the three 
matrix inversion methods : CG, BiCGstabl and BiCGstab2. For this study we 
employed the U{1) lattice gauge theory with Wilson fermions, the inverted matrix 
being an even-odd decomposed version of the Wilson fermionic matrix M. (or 
M.'^ M). The main accent in our study was made on the 'critical' case k ~ ^dP)- 
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iteration 



Figure 10: Convergence history for the double precision CG and BiCGstabl at 
p = 0.8 and k = 0.2170. 
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Confinement phase: p=0.8 
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Figure 11: Comparative efficiency of BiCGstabI and BiCGstab2 for different /t's 
in the confinement phase at (3 = 0.8. The vertical line indicates the position of Kc- 

23 



2.0 



o 

o 
"o 



1.8 



1.6 



1.4 



1.2 



A- 



\ 



Coulomb phase : p=1 .1 



CD 

§ 1.0 
■Jo 

(0 
Q. 

i 0.8 
o 



0.6 



\ \ 



\ 



\ \ ' 

\ w 



• • BiCGstabI vs CG (DP) : 8 x1 6 
A BiCGstab2 vs CG (DP) : 8^1 6 

♦ ♦BiCGstabI vs CG (QH2): 16^32 



0.4 



0.2 



0.0 



K 



c 



0.120 0.125 0.130 0.135 0.140 0.145 



K 



Figure 12: Same as Figure |TT]but in the Coulomb phase at (3 = 1.1. 
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The runs have been performed in double precision implementation (Cray, Convex, 
Siemens-Fujitsu) and in single precision implementation (Convex and especially 
QH2). 

Our main conclusions are as follows. 

• The standard CG with double precision appears to be the most reliable 
method, though not the fastest one. No failures were observed even at 

K ~ /tc(/9). 

The CG with single precision in the Coulomb phase, i.e., /3 > /3o — 1.01 , 
is very reliable and reproduces the double precision results with reasonable 
accuracy even at k — Kc{P). 

In the confinement phase {(5 < Pq), however, the question of the reliabily of 
the CGSP in the k, — > ndP) hmit still remains open. 

• The rehability of the double precision BiCGstabl is rather high when k, ~ 
KciP)- The failure rate is zero a,t P — 0.8 , and about 2% in the extreme case 
j3 = 0. Given some precautions for a fatal case, e.g. restarting or switching 
to another inverter, it can be used to study the limit k Kc(/?)- 

In the case of single precision in the confinement phase the reliability of the 
BiCGstabl drastically drops in the k —>■ Kc{i3) limit. 

• In the confinement phase in the re-region, where a comparison was possible, 
BiCGstabl turned out to be the most efficient algorithm. The gain in CPU 
time was a factor of ~ 2.5 5 as compared to CG at (3 = 0.8, and up to a 
factor of --^ 9 in the extreme strong coupling regime (3 = 0. 

This does not hold for the Coulomb phase, where BiCGstabl (as well as Bi- 
CGstab2) exhibits a sudden drop in efficiency around Kc{P) and CG becomes 
the fastest method. 

• BiCGstab2 even with double precision shows rather low reliability in the 'crit- 
ical' region k ~ Kc- This fact practically excludes this method from the study 
of the chiral limit of the theory. 

In the parameter range, where BiCGstab2 worked, it was always less efficient 
than BiCGstabl (by a factor ~ 0.6 ^ 0.9). 

The completely different behavior of the comparative efficiency found in the 
confinement and Coulomb phases suggests, that the optimal choice of the inversion 
algorithm is dictated by the typical distribution of eigenvalues in the particular 
phase. Further investigation is needed to work out this point. 

We expect that most of the conclusions drawn in this paper can be applied also 
to lattice QCD. 
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